Testing Exchangeability On-Line

Authors

  • Vladimir Vovk
  • Ilia Nouretdinov
  • Alexander Gammerman
Abstract

The majority of theoretical work in machine learning is done under the assumption of exchangeability: essentially, it is assumed that the examples are generated independently from the same probability distribution. This paper is concerned with the problem of testing the exchangeability assumption in the on-line mode: examples are observed one by one, and the goal is to monitor on-line the strength of evidence against the hypothesis of exchangeability. We introduce the notion of exchangeability martingales, which are on-line procedures for detecting deviations from exchangeability; in essence, they are betting schemes that never risk bankruptcy and are fair under the hypothesis of exchangeability. Some specific exchangeability martingales are constructed using the Transductive Confidence Machine. We report experimental results showing their performance on the USPS benchmark data set of hand-written digits (known to be somewhat heterogeneous); one of them multiplies the initial capital by more than 10^18, which means that the hypothesis of exchangeability is rejected at the significance level 10^-18.
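The betting interpretation in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a "power" betting function `epsilon * p**(epsilon - 1)`, uses each raw observation as its own nonconformity score (a deliberately simple, hypothetical choice), and the names `conformal_p_value` and `power_martingale_log10` are made up for this example. Each new observation gets a randomized p-value from its rank among everything seen so far, and the capital is multiplied by the bet on that p-value. Because the betting function integrates to 1 over [0, 1], the capital is a fair game under exchangeability, so by Ville's inequality the chance of it ever reaching a level C is at most 1/C.

```python
import math
import random


def conformal_p_value(scores, rng):
    """Randomized p-value of the last score relative to all scores so far.

    Larger scores are treated as "stranger". Ties are broken with a
    uniform random number so that p-values are uniform under exchangeability.
    """
    last = scores[-1]
    greater = sum(1 for s in scores if s > last)
    equal = sum(1 for s in scores if s == last)  # counts the last score itself
    theta = 1.0 - rng.random()  # uniform on (0, 1], keeps the p-value positive
    return (greater + theta * equal) / len(scores)


def power_martingale_log10(values, epsilon=0.92, seed=0):
    """Return the log10 capital after each observation.

    Assumption for this sketch: the nonconformity score of an observation
    is the observation itself, so upward drift yields small p-values.
    """
    rng = random.Random(seed)
    scores = []
    log10_capital = 0.0  # initial capital is 1
    path = []
    for x in values:
        scores.append(float(x))
        p = conformal_p_value(scores, rng)
        # Multiply the capital by the bet epsilon * p**(epsilon - 1), in log10.
        log10_capital += math.log10(epsilon) + (epsilon - 1.0) * math.log10(p)
        path.append(log10_capital)
    return path


if __name__ == "__main__":
    drifting = list(range(200))  # steadily increasing, clearly not exchangeable
    shuffled = drifting[:]
    random.Random(1).shuffle(shuffled)  # same values in random order
    print("drifting:", round(power_martingale_log10(drifting)[-1], 2))
    print("shuffled:", round(power_martingale_log10(shuffled)[-1], 2))
```

On the steadily increasing sequence the log capital grows quickly, while on the shuffled copy it is expected to stay small. Working in log10 avoids overflow and makes a final value of 18 directly comparable to the 10^18 figure quoted in the abstract.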

Similar resources

Plug-in martingales for testing exchangeability on-line

A standard assumption in machine learning is the exchangeability of data, which is equivalent to assuming that the examples are generated from the same probability distribution independently. This paper is devoted to testing the assumption of exchangeability on-line: the examples arrive one by one, and after receiving each example we would like to have a valid measure of the degree to which the...

Response to Letter to the Editor by Philip Good on To Permute or Not to Permute

In current practice, such as GWAS (genome-wide association studies), permutation is often applied to multiple testing for association between large number of features [e.g. single nucleotide polymorphisms (SNPs)] and phenotypes (Hahn et al., 2008). Inferring that there is a difference between the phenotypic groups X and Y in some of the features is not very useful. One has to know for which fea...

Tractability through Exchangeability: A New Perspective on Efficient Probabilistic Inference

Exchangeability is a central notion in statistics and probability theory. The assumption that an infinite sequence of data points is exchangeable is at the core of Bayesian statistics. However, finite exchangeability as a statistical property that renders probabilistic inference tractable is less well-understood. We develop a theory of finite exchangeability and its relation to tractable probab...

Well-Calibrated Predictions from Online Compression Models

It has been shown recently that Transductive Confidence Machine (TCM) is automatically well-calibrated when used in the on-line mode and provided that the data sequence is generated by an exchangeable distribution. In this paper we strengthen this result by relaxing the assumption of exchangeability of the data-generating distribution to the much weaker assumption that the data agrees with a gi...

Tractability through Exchangeability: A New Perspective on Efficient Probabilistic Inference [Highlight on Published Work]

Exchangeability is a central notion in statistics and probability theory. The assumption that an infinite sequence of data points is exchangeable is at the core of Bayesian statistics. However, finite exchangeability as a statistical property that renders probabilistic inference tractable is less well-understood. We develop a theory of finite exchangeability and its relation to tractable probab...

Publication date: 2003